Skip to content

BUG: Comparisons result in different dtypes for empty DataFrames #15077 #15115

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 9 commits into from

Conversation

mralgos
Copy link
Contributor

@mralgos mralgos commented Jan 12, 2017

@codecov-io
Copy link

codecov-io commented Jan 12, 2017

Current coverage is 85.67% (diff: 100%)

Merging #15115 into master will decrease coverage by <.01%

@@             master     #15115   diff @@
==========================================
  Files           145        145          
  Lines         51419      51417     -2   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
- Hits          44057      44054     -3   
- Misses         7362       7363     +1   
  Partials          0          0          

Powered by Codecov. Last update 97fd744...a5ca359

@TomAugspurger
Copy link
Contributor

Could you add the examples from #15077 as a test cases? Also a release not in whatsnew/0.20.txt. Thanks.

@TomAugspurger TomAugspurger added the Dtype Conversions Unexpected or buggy dtype conversions label Jan 12, 2017
@jreback jreback added the Bug label Jan 12, 2017
@mralgos
Copy link
Contributor Author

mralgos commented Jan 12, 2017

Yep. Is tests/frame/test_operators.py the right place for the test?

@mralgos
Copy link
Contributor Author

mralgos commented Jan 12, 2017

I've added the test and a line in whatsnew/0.20.0.txt. Build test in progress.

@@ -671,6 +671,28 @@ def _test_seq(df, idx_ser, col_ser):
exp = DataFrame({'col': [False, True, False]})
assert_frame_equal(result, exp)

def test_return_dtypes_bool_op_costant(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you replicate these tests for series as well (or point to that we have equiv tests)

@@ -362,3 +362,4 @@ Bug Fixes


- Bug in ``pd.read_csv()`` for the C engine where ``usecols`` were being indexed incorrectly with ``parse_dates`` (:issue:`14792`)
- Bug in dtypes returned by comparison methods (e.g., ``lt``, ``gt``, ...) against a constant for an empty DataFrame (:issue:`15077`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

say that this did not previously return a boolean Series result.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, thanks for reviewing it. I thought it was enough to link the issue for more details.

empty = df.iloc[:0]
for op in ['eq', 'ne', 'gt', 'lt', 'ge', 'le']:
f = getattr(empty, op)
for col in ['x', 'y']:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of this comparison, use assert_series_equal and compare against Series([2], ['bool']
where you do result = getattr(empty, op)(const).get_dtype_counts()
e.g

In [5]: df = DataFrame({'x': [1, 2, 3], 'y': [1., 2., 3.]})

In [6]: df.lt(2)
Out[6]: 
       x      y
0   True   True
1  False  False
2  False  False

In [7]: df.lt(2).dtypes
Out[7]: 
x    bool
y    bool
dtype: object

In [8]: df.lt(2).dtypes.values
Out[8]: array([dtype('bool'), dtype('bool')], dtype=object)

In [9]: df.lt(2).get_dtype_counts()
Out[9]: 
bool    2
dtype: int64

In [10]: Series([2], ['bool'])
Out[10]: 
bool    2
dtype: int64

for col in ['x', 'y']:
res = f(const)
c = getattr(res, col)
self.assertEqual(c.dtypes, bool)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

s = pd.Series([1, 3, 2], index=range(3))
for op in ['eq', 'ne', 'gt', 'lt', 'ge', 'le']:
f = getattr(s, op)
self.assertFalse(f(2).dtypes, np.dtype('bool'))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same type of comparison here and below

@@ -382,5 +382,4 @@ Bug Fixes

- Bug in ``Series`` constructor when both ``copy=True`` and ``dtype`` arguments are provided (:issue:`15125`)
- Bug in ``pd.read_csv()`` for the C engine where ``usecols`` were being indexed incorrectly with ``parse_dates`` (:issue:`14792`)

- Bug in ``Series.dt.round`` inconsistent behaviour on NAT's with different arguments (:issue:`14940`)
- Not boolean Series returned by comparison methods (e.g., ``lt``, ``gt``, ...) against a constant for an empty DataFrame (:issue:`15077`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

say

Incorrect dtyped Series was returned by comparions methods.........against a constant for an empty DataFrame

put Series/DataFrame with double-backticks.

@jreback jreback added this to the 0.20.0 milestone Jan 21, 2017
@jreback
Copy link
Contributor

jreback commented Jan 21, 2017

thanks!

@jreback jreback closed this in 4515be9 Jan 21, 2017
AnkurDedania pushed a commit to AnkurDedania/pandas that referenced this pull request Mar 21, 2017
…as-dev#15077

closes pandas-dev#15077

Author: Giacomo Ferroni <[email protected]>
Author: Giacomo Ferroni <[email protected]>
Author: mralgos <[email protected]>

Closes pandas-dev#15115 from mralgos/gh15077 and squashes the following commits:

a5ca359 [mralgos] Merge branch 'master' into gh15077
dc0803b [Giacomo Ferroni] Merge branch 'master' into gh15077
b2f2d1e [Giacomo Ferroni] Merge branch 'gh15077' of https://github.com/mralgos/pandas into gh15077
fcbcb5b [Giacomo Ferroni] Apply review changes
9723c5d [Giacomo Ferroni] Merge branch 'master' into gh15077
eb7d9fd [Giacomo Ferroni] Delete blank lines
28437bb [Giacomo Ferroni] Check for bool dtype return added also for Series. Minor update to whatsnew
19296f1 [Giacomo Ferroni] Added test for gh15077 (cf. gh15115) and whatsnew note
ea11867 [Giacomo Ferroni] [gh15077] Bugfix
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Dtype Conversions Unexpected or buggy dtype conversions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Comparisons result in different dtypes for empty DataFrames
4 participants